NSF PAR Search | NSF Public Access Repository

BI‐LAVA: Biocuration With Hierarchical Image Labelling Through Active Learning and Visual Analytics

https://doi.org/10.1111/cgf.15261

Trelles, Juan; Wentzel, Andrew; Berrios, William; Shatkay, Hagit; Marai, G Elisabeta (February 2025, Computer Graphics Forum)

Abstract: In the biomedical domain, taxonomies organize the acquisition modalities of scientific images in hierarchical structures. Such taxonomies leverage large sets of correct image labels and provide essential information about the importance of a scientific publication, which could then be used in biocuration tasks. However, the hierarchical nature of the labels, the overhead of processing images, the absence or incompleteness of labelled data and the expertise required to label this type of data impede the creation of useful datasets for biocuration. From a multi‐year collaboration with biocurators and text‐mining researchers, we derive an iterative visual analytics and active learning (AL) strategy to address these challenges. We implement this strategy in a system called BI‐LAVA (Biocuration with Hierarchical Image Labelling through Active Learning and Visual Analytics). BI‐LAVA leverages a small set of image labels, a hierarchical set of image classifiers and AL to help model builders deal with incomplete ground‐truth labels, target a hierarchical taxonomy of image modalities and classify a large pool of unlabelled images. BI‐LAVA's front end uses custom encodings to represent data distributions, taxonomies, image projections and neighbourhoods of image thumbnails, which help model builders explore an unfamiliar image dataset and taxonomy and correct and generate labels. An evaluation with machine learning practitioners shows that our mixed human–machine approach successfully supports domain experts in understanding the characteristics of classes within the taxonomy, as well as validating and improving data quality in labelled and unlabelled collections.
Free, publicly-accessible full text available February 1, 2026
MouseScholar: Evaluating an Image+Text Search System for Biocuration

https://doi.org/10.1109/BIBM58861.2023.10385503

Trabucco, Juan Trelles; Floricel, Carla; Arighi, Cecilia; Shatkay, Hagit; Raciti, Daniela; Ringwald, Martin; Marai, G Elisabeta (December 2023, IEEE Xplore)

Biocuration is the process of analyzing biological or biomedical articles to organize biological data into data repositories using taxonomies and ontologies. Due to the expanding number of articles and the relatively small number of biocurators, automation is desired to improve the workflow of assessing articles worth curating. As figures convey essential information, automatically integrating images may improve curation. In this work, we instantiate and evaluate a first-of-its-kind, hybrid image+text document search system for biocuration. The system, MouseScholar, leverages an image modality taxonomy derived in collaboration with biocurators, together with figure segmentation and classifier components as a back-end and a streamlined front-end interface to search and present document results. We formally evaluated the system with ten biocurators on a mouse genome informatics biocuration dataset and collected feedback. The results demonstrate the benefits of blending text and image information when presenting scientific articles for biocuration.
Full Text Available
Enhancing biomedical search interfaces with images

https://doi.org/10.1093/bioadv/vbad095

Trelles Trabucco, Juan; Arighi, Cecilia; Shatkay, Hagit; Marai, G. Elisabeta; Lengauer, Thomas (ed.) (July 2023, Bioinformatics Advances)

Abstract: Motivation: Figures in biomedical papers communicate essential information with the potential to identify relevant documents in biomedical and clinical settings. However, academic search interfaces mainly search over text fields. Results: We describe a search system for biomedical documents that leverages image modalities and an existing index server. We integrate a problem-specific taxonomy of image modalities and image-based data into a custom search system. Our solution features a front-end interface to enhance classical document search results with image-related data, including page thumbnails, figures, captions and image-modality information. We demonstrate the system on a subset of the CORD-19 document collection. A quantitative evaluation demonstrates higher precision and recall for biomedical document retrieval. A qualitative evaluation with domain experts further highlights our solution’s benefits to biomedical search. Availability and implementation: A demonstration is available at https://runachay.evl.uic.edu/scholar. Our code and image models can be accessed via github.com/uic-evl/bio-search. The dataset is continuously expanded.
Domain-Informed Neural Networks for Interaction Localization Within Astroparticle Experiments

https://doi.org/10.3389/frai.2022.832909

Liang, Shixiao; Higuera, Aaron; Peters, Christina; Roy, Venkat; Bajwa, Waheed U.; Shatkay, Hagit; Tunnell, Christopher D. (June 2022, Frontiers in Artificial Intelligence)

This work proposes a domain-informed neural network architecture for experimental particle physics, using particle interaction localization with the time-projection chamber (TPC) technology for dark matter research as an example application. A key feature of the signals generated within the TPC is that they allow localization of particle interactions through a process called reconstruction (i.e., inverse-problem regression). While multilayer perceptrons (MLPs) have emerged as a leading contender for reconstruction in TPCs, such a black-box approach does not reflect prior knowledge of the underlying scientific processes. This paper looks anew at neural network-based interaction localization and encodes prior detector knowledge, in terms of both signal characteristics and detector geometry, into the feature encoding and the output layers of a multilayer (deep) neural network. The resulting neural network, termed Domain-informed Neural Network (DiNN), limits the receptive fields of the neurons in the initial feature encoding layers in order to account for the spatially localized nature of the signals produced within the TPC. This aspect of the DiNN, which has similarities with the emerging area of graph neural networks in that the neurons in the initial layers only connect to a handful of neurons in their succeeding layer, significantly reduces the number of parameters in the network in comparison to an MLP. In addition, in order to account for the detector geometry, the output layers of the network are modified using two geometric transformations to ensure the DiNN produces localizations within the interior of the detector. The end result is a neural network architecture that has 60% fewer parameters than an MLP, but that still achieves similar localization performance and provides a path to future architectural developments with improved performance because of their ability to encode additional domain knowledge into the architecture.
Full Text Available
Modality-Classification of Microscopy Images Using Shallow Variants of Deep Networks

https://doi.org/10.1109/BIBM49941.2020.9313467

Trabucco, Juan Trelles; Li, Pengyuan; Arighi, Cecilia; Shatkay, Hagit; Marai, G. Elisabeta (December 2020, 2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM))

Full Text Available
Stochastic Sequential Modeling: Toward Improved Prostate Cancer Diagnosis Through Temporal-Ultrasound

https://doi.org/10.1007/s10439-020-02585-y

Nahlawi, Layan; Imani, Farhad; Gaed, Mena; Gomez, Jose A.; Moussa, Madeleine; Gibson, Eli; Fenster, Aaron; Ward, Aaron; Abolmaesumi, Purang; Mousavi, Parvin; et al. (August 2020, Annals of Biomedical Engineering)

Full Text Available
Co-occurrence of medical conditions: Exposing patterns through probabilistic topic modeling of snomed codes

https://doi.org/10.1016/j.jbi.2018.04.008

Bhattacharya, Moumita; Jurkovitz, Claudine; Shatkay, Hagit (June 2018, Journal of Biomedical Informatics)

Full Text Available
Translational Knowledge Discovery Between Drug Interactions and Pharmacogenetics

https://doi.org/10.1002/cpt.1745

Wu, Heng‐Yi; Shendre, Aditi; Zhang, Shijun; Zhang, Pengyue; Wang, Lei; Zeruesenay, Desta; Rocha, Luis M.; Shatkay, Hagit; Quinney, Sara K.; Ning, Xia; et al. (April 2020, Clinical Pharmacology & Therapeutics)

Full Text Available
Assessing chronic kidney disease from office visit records using hierarchical meta-classification of an imbalanced dataset

https://doi.org/10.1109/BIBM.2017.8217733

Bhattacharya, Moumita; Jurkovitz, Claudine; Shatkay, Hagit (November 2017, The IEEE International Conference on Bioinformatics and Biomedicine (BIBM))

Full Text Available
Identifying articles relevant to drug-drug interaction: Addressing class imbalance

https://doi.org/10.1109/BIBM.2017.8217818

Zhang, Gongbo; Bhattacharya, Moumita; Wu, Heng-Yi; Li, Pengyuan; Li, Lang; Shatkay, Hagit (November 2017, The IEEE International Conference on Bioinformatics and Biomedicine)

Full Text Available
